-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG:fix rolling skew kurt floating issue #18065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks, the failure on Travis is unrelated. Could you add a note to whatsnew/v0.21.1.txt and a unit test (maybe using the example from #18044)? |
Hello @ogotaiking! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on November 07, 2017 at 13:51 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #18065 +/- ##
==========================================
- Coverage 91.25% 91.23% -0.02%
==========================================
Files 163 163
Lines 50120 50120
==========================================
- Hits 45737 45728 -9
- Misses 4383 4392 +9
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18065 +/- ##
==========================================
- Coverage 91.25% 91.23% -0.02%
==========================================
Files 163 163
Lines 50124 50124
==========================================
- Hits 45742 45733 -9
- Misses 4382 4391 +9
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.21.1.txt
Outdated
@@ -57,6 +57,8 @@ Documentation Changes | |||
Bug Fixes | |||
~~~~~~~~~ | |||
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`) | |||
- Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` with all equal values has floating issue (:issue:`18044`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
producing numerical unstable results rather than NaN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original behavior for skewness and kurtosis calculation is :
if the {Xi} distribution is a uniform distribution ( all values are equal), Thus the variance of the values( Sum(Xi-Xmean) ^2 is zero. so B should be zero in this case. But some numerically unstable issue cause B not to be zero.
so consider this situation, we assume if {Xi}'s variance is less than 1e-14, we treat it as uniform distribution.
based on previous behavior is return Nan. I don't want to break this behavior.
Some reference :
from mathematica, for uniform distribution skew/kurt return Nan
http://www.wolframalpha.com/input/?i=skewness%7B1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1%7D
http://www.wolframalpha.com/input/?i=kurtosis%7B1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1%7D
pandas/_libs/window.pyx
Outdated
@@ -788,7 +788,7 @@ cdef inline double calc_skew(int64_t minp, int64_t nobs, double x, double xx, | |||
A = x / dnobs | |||
B = xx / dnobs - A * A | |||
C = xxx / dnobs - A * A * A - 3 * A * B | |||
if B <= 0 or nobs < 3: | |||
if B <= 1e-14 or nobs < 3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here about what we are checking (and below)
pls rebase on master as well. |
@jreback for the comments for return Nan for numerically unstable issue[ seems lost the comments during rebase] consider the uniform distribution {Xi}, X1=X2=....Xn-1=Xn, the variance sum(Xi-Xm)^2/n should be Zero, but some floating issue cause the variance is in range(0,1e-14), and the variance is in the denominator, that will cause skew/kurt result with an extremely large number. so we can treat if the variance is less than 1e-14, it's uniform distribution. The original code only checks if the variance == Zero, and return Nan for uniform distribution, so I don't want to break this behavior and leave the result as Nan. some reference : result is Nan(undefined) |
@ogotaiking not sure of your last, is this a different case? |
@jreback same case. and also a reference from original skewness calculation code pandas/core/nanops.py def nanskew(values, axis=None, skipna=True):
...
# floating point error
m2 = _zero_out_fperr(m2) <- here m2 is eq B in rolling_skew
m3 = _zero_out_fperr(m3)
with np.errstate(invalid='ignore', divide='ignore'):
result = (count * (count - 1) ** 0.5 / (count - 2)) * (m3 / m2 ** 1.5) and the function _zero_out_fperr _zero_out_fperr(arg):
if isinstance(arg, np.ndarray):
with np.errstate(invalid='ignore'):
return np.where(np.abs(arg) < 1e-14, 0, arg)
else:
return arg.dtype.type(0) if np.abs(arg) < 1e-14 else arg so here we treat the B <1e-14 is eq 0 in calc_skew and calc_kurt is following the original skew/kurt behavior. |
@ogotaiking ok I c. can you add update the comment in the code above to include a refernce to _zero_fperr (and maybe a reference in zero_fperr to calc_kurt/calc_skew), just so that future readers will know that these are connected. otherwise lgtm. ping when pushed / green. |
# cause B != 0. and cause the result is a very | ||
# large number. | ||
# | ||
# in core/nanops.py nanskew/nankurt call the function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the reference to _zero_fperr as well (back to here)
lgtm merge on green |
thanks @ogotaiking |
(cherry picked from commit 93c755e)
(cherry picked from commit 93c755e)
git diff upstream/master -u -- "*.py" | flake8 --diff